NIL Is Not Nothing: Recognition of Chinese Network Informal Language Expressions

Authors

  • Yunqing Xia
  • Kam-Fai Wong
  • Wei Gao
Abstract

Informal language is actively used in network-mediated communication, e.g. chat rooms, BBS, email, and text messages. We refer to the anomalous terms used in such contexts as network informal language (NIL) expressions. For example, “偶 (ou3)” is used to replace “我 (wo3)” in Chinese ICQ. Without unconventional resources, knowledge, and techniques, existing natural language processing approaches are less effective at handling NIL text. We propose to study NIL expressions with a NIL corpus and to investigate techniques for processing them. Two methods for Chinese NIL expression recognition are designed in the NILER system. The experimental results show that the pattern matching method yields higher precision, while the support vector machine method yields a higher F-1 measure. These results are encouraging and justify our future research effort in NIL processing.
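
To make the pattern matching idea and the precision and F-1 comparison concrete, here is a minimal Python sketch. It is not the NILER implementation, which the abstract does not describe in detail. It assumes a hypothetical lexicon mapping NIL expressions to standard forms, a greedy longest-match scan, and the standard definitions of precision, recall, and F-1. The entry for the abstract's ou3/wo3 example assumes the characters 偶 and 我; the "88" entry is purely illustrative.

```python
# Hypothetical NIL lexicon: informal expression -> standard form.
# "偶" -> "我" mirrors the ou3/wo3 example in the abstract (characters assumed);
# "88" -> "拜拜" is an illustrative entry, not taken from the paper.
NIL_LEXICON = {"偶": "我", "88": "拜拜"}


def find_nil_expressions(text, lexicon=NIL_LEXICON):
    """Greedy longest-match lookup; returns (start, expression, standard_form) tuples."""
    hits = []
    max_len = max(map(len, lexicon))
    i = 0
    while i < len(text):
        # Try the longest candidate first so multi-character entries win.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                hits.append((i, candidate, lexicon[candidate]))
                i += size
                break
        else:
            i += 1  # no lexicon entry starts at this position
    return hits


def precision_recall_f1(predicted, gold):
    """Standard precision, recall, and F-1 over sets of predicted and gold items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


hits = find_nil_expressions("偶走了88")
```

A pure lexicon lookup like this cannot tell an informal use of a character from its ordinary use inside a standard word, so a practical matcher would need context-sensitive patterns or a learned classifier on top of it.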

Similar resources

A Two-Stage Incremental Annotation Approach to Constructing a Network Informal Language Corpus

Network Informal Language (NIL) refers to the special human language widely used in the community of digital network chat via platforms such as chat rooms/tools, mobile phone short message services (SMS), bulletin board systems (BBS), emails, etc. NIL holds anomalous characteristics in forming words, phrases, and non-alphabetical characters. This makes it difficult to handle NIL text by convent...

Pragmatic expressions in cross-linguistic perspective

This paper focuses on some pragmatic expressions that are characteristic of informal spoken English, their possible equivalents in some other languages, and their use by EFL learners from different backgrounds. These expressions, called general extenders (e.g. and stuff, or something), are shown to be different from discourse markers and to exhibit variation in form, funct...

A Neural Network-Based System for Recognizing and Classifying Named Entities in Persian Texts

Named Entity Recognition (NER) is a fundamental task in natural language processing and is also known as a subtask of information extraction. We seek to locate and classify named entities in text into predefined categories such as the names of persons, organizations, locations, time expressions, etc. Named Entity Recognition for English texts has been widely researched in recent years, howev...

Mining Informal Language from Chinese Microtext: Joint Word Recognition and Segmentation

We address the problem of informal word recognition in Chinese microblogs. A key problem is the lack of word delimiters in Chinese. We exploit this reliance as an opportunity: recognizing the relation between informal word recognition and Chinese word segmentation, we propose to model the two tasks jointly. Our joint inference method significantly outperforms baseline systems that conduct the t...

Spatial and symbolic recognition of Chinese mosques

The history of Islam in China began in 654 AD, when the first ambassador of the Islamic caliphate reached the court of the Chinese emperor. Islam then spread throughout the country over the course of a century. This study examines how architectural elements and spatial forms are influenced by Islamic or Buddhist-Chinese traditions. First, it must be made clear which symbo...

Publication date: 2005